Description

Field of the Invention 

[0001] This invention relates to the field of nucleic acid regulatory elements that affect mRNA translation, export,
and stability. More specifically, the invention relates to the screening of 5' and 3' untranslated RNA sequences, the 
identification of RNA regulatory elements within these sequences, and the identification of compounds that modulate 
the function of these RNA regulatory sequences. 

Background

[0002] While transcriptional controls regulate gene expression by influencing the rate of mRNA production, post- 
transcriptional mechanisms can also regulate gene expression by modulating the amount of protein produced from an 
mRNA molecule. For example, gene expression can be regulated by altering mRNA translation efficiency (Izquierdo
and Cueza, Mol. Cell Biol. 17: 5255-5268, 1997; Yang et al., J. Biol. Chem. 272: 15466-73, 1997), or by altering mRNA
stability (Ross, Microbiol. Rev. 59: 423-50, 1995). Post-transcriptional control mechanisms appear to play an especially
important role in the gene expression response to environmental factors, such as response to heat shock (Sierra et
al., Mol. Biol. Rep. 19: 211-20, 1994), iron availability (Hentze et al., Proc. Natl. Acad. Sci. USA 93: 8175-82, 1996),
oxygen availability (Levy et al., J. Biol. Chem. 271: 2746-53, 1996; McGary et al., J. Biol. Chem. 272: 8628-34, 1997),
and growth factors (Amara et al., Nucleic Acids Res. 21: 4803-09, 1993).

[0003] Post-transcriptional regulatory elements may be present in the 5' and 3' mRNA untranslated regions (UTRs). 
At the 5' UTR, mRNA binding to ribosomes is generally the rate-limiting step in translation initiation (Mathews et al., 
In: Translational Control, pages 1-30, Eds: Hershey et al., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, 
NY, 1996). At the 3' UTR, regulatory elements may modulate mRNA translation and degradation, as well as mRNA
transport and subcellular localization (Jackson, Cell 74: 9-14, 1993). However, the nature of most UTR
post-transcriptional elements remains poorly understood. A method for efficiently characterizing these mRNA regulatory sequences
would advance the discovery of compounds that modulate expression of therapeutically important proteins via
regulatory mRNA sites.

Summary of the Invention

[0004] We have discovered a method for constructing libraries that are specifically biased for RNA regulatory sites. 
In the first aspect, the invention features a cDNA library consisting essentially of at least 100 different cDNA sequences
that correspond to different mRNA untranslated region (UTR) sequences isolated and separate from adjacent mRNA
coding sequences. Preferably, the cDNA sequences are cloned into a vector system that can express the sequences,
and such a vector is also a feature of this invention. This vector includes the following: a) a nucleotide sequence
encoding an mRNA UTR sequence in operative linkage to a promoter, wherein the nucleotide sequence is derived
from the cDNA library of the first aspect; b) a first reporter gene positioned for transcription upstream or downstream
of the UTR-encoding nucleotide sequence; and c) a second, different reporter gene in operative linkage to a promoter
but unassociated with the UTR-encoding nucleotide sequence. Preferably, the reporter genes encode a fluorescent
protein or cell surface marker protein.

[0005] A second and related aspect of the invention features a cDNA library, wherein the library is constructed by 
steps that include the following: a) purifying poly(A)+ RNA from total RNA; b) performing controlled, non-random
enzymatic digestion of AUG sequences in the poly(A)+ RNA; c) purifying the digested RNA to obtain the fragments
containing the 5' end sequences; and d) synthesizing cDNA from the purified RNA obtained in step (c); wherein the
library consists essentially of cDNA sequences corresponding to mRNA 5' untranslated region (UTR) sequences,
isolated and separate from adjacent mRNA coding sequences. Preferably, the enzymatic digestion is carried out using
RNase H.

[0006] In a third aspect, the invention features a cDNA library constructed by steps that include the following: a)
purifying poly(A)+ RNA from total RNA; b) synthesizing nucleic acid heteroduplexes from the poly(A)+ RNA, using
degenerate primers that hybridize preferentially to the region surrounding and including the initiation codon, where the
heteroduplexes comprise the 5' end sequences of the RNA; c) purifying the heteroduplexes obtained in step (b) to
obtain the fragments containing the 5' end sequences; and d) synthesizing cDNA from the purified heteroduplexes
obtained in step (c); wherein the library consists essentially of cDNA sequences corresponding to mRNA 5' untranslated
region (UTR) sequences, isolated and separate from adjacent mRNA coding sequences.

[0007] In one embodiment of any of the above three aspects of the invention, the cDNA library consists essentially
of cDNA sequences corresponding to mRNA untranslated region sequences, isolated in intact form.

[0008] In preferred embodiments of the second or third aspects of the invention, the 5' sequence purification is carried
out using a cap binding protein, for example, an eIF4E fusion protein or an antibody to the 5' cap, and the cDNA
sequences are cloned into a vector system that can express the sequences. This vector includes the following: a) a
nucleotide sequence encoding an mRNA UTR sequence in operative linkage to a promoter, wherein the nucleotide
sequence is derived from the cDNA library of the second or third aspect; b) a first reporter gene positioned for
transcription upstream or downstream of the UTR-encoding nucleotide sequence; and c) a second, different reporter gene
in operative linkage to a promoter but unassociated with the UTR-encoding nucleotide sequence. Preferably, the
reporter genes encode a fluorescent protein or cell surface marker protein.

[0009] A related fourth aspect of the invention is a cDNA library, wherein the library is constructed by steps that 
include the following: a) purifying poly(A)+ RNA from total RNA; b) performing random digestion on the poly(A)+ RNA; 
c) purifying the digested RNA to obtain poly(A)-containing fragments; and d) synthesizing cDNA from the purified RNA
obtained in step (c); wherein the library consists essentially of cDNA sequences corresponding to 3' UTR sequences, 
isolated and separate from adjacent mRNA coding sequences. 

[0010] A cDNA library is also featured in the fifth aspect of the invention. This cDNA library is constructed by steps 
that include the following: a) purifying poly(A)+ RNA from total RNA; b) loading the poly(A)+ RNA with ribosomes; and
c) performing reverse transcription on the loaded poly(A)+ RNA using an oligo(dT) primer and polymerase;
wherein the library consists essentially of cDNA sequences corresponding to 3' UTR sequences, isolated and separate
from adjacent mRNA coding sequences. Preferably, the cDNA sequences of the libraries of the fourth or fifth aspects 
are cloned into vector systems that can express the sequences, and such vectors are also a feature of this invention. 
These vectors include the following: a) a nucleotide sequence encoding an mRNA UTR sequence in operative linkage
to a promoter, wherein the nucleotide sequence is derived from the cDNA library of the fourth or fifth aspect; b) a first
reporter gene positioned for transcription upstream or downstream of the UTR-encoding nucleotide sequence; and c) 
a second, different reporter gene in operative linkage to a promoter but unassociated with the UTR-encoding nucleotide 
sequence. Preferably, the reporter genes encode a fluorescent protein or cell surface marker protein. 
[0011] In one embodiment of the fourth or fifth aspect of the invention, the cDNA library consists essentially of cDNA
sequences corresponding to 3' untranslated region sequences, isolated in intact form.

[0012] A sixth aspect of the invention provides a method of identifying a regulatory UTR sequence that includes the 
following steps: a) transfecting a plurality of host cells with a plurality of vectors of the present invention, wherein the 
host cells are transfected with different UTR sequences; b) sorting cells on the basis of the ratio between expression 
of the first reporter gene and the second reporter gene; c) identifying the cells of step (a) that have skewed expression
ratios as compared to the population of cells of step (a) as a whole, or as compared to cells transfected with a vector
that encodes the first and second reporter gene, but lacks the corresponding UTR sequence; and d) sequencing the 
UTR expressed in the identified cells. Preferably, the gene expression is detected by emission of fluorescence and the 
cells are sorted by a fluorescence activated cell sorter. 

[0013] The seventh and final aspect of the invention features a cell transfected with any of the vectors of the present
invention.

[0014] By "different mRNA untranslated region (UTR) sequences" or "different UTR sequences" is meant sequences 
that differ from each other in that they are derived from different mRNA species. As used herein, mRNA UTR sequences 
that are products of alternative splicing are considered to be different mRNA UTR sequences.
[0015] By "controlled, non-random enzymatic digestion of AUG sequences" is meant preferentially digesting mRNA
at the site of AUG sequences, for example, using RNase H and a mixture of degenerate AUG-complementary
oligonucleotide 7-mers, under conditions that require hybridization of more than 5 consecutive base pairs for RNase
substrate recognition. To preferentially digest the initiation-AUG sequences in an mRNA population, the 7-mers in the
AUG-complementary oligonucleotide mixture used have frequencies of A, C, G, and T at each position that are
complementary to the frequencies of A, C, G, and U occurring in all known vertebrate mRNA sequences between the -3
and +4 positions (where +1 is the first nucleotide of the coding sequence) (see, e.g., Table 1).

[0016] By "UTR sequences isolated and separate from adjacent mRNA coding sequences" is meant the following: 
1) 5' UTR sequences that begin at the 5' end of a transcribed mRNA and extend up to, but do not include, the translation
AUG initiation site; and 2) 3' UTR sequences that begin at the mRNA nucleic acid in the position 3' adjacent to the 
translation termination site and extend to the poly(A) tail of the transcribed mRNA. Preferably, the UTR sequences are
isolated in intact form.
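
The boundaries in this definition can be illustrated with a short Python sketch. It assumes that the positions of the initiation AUG and of the termination codon are already known and that the poly(A) tail has been removed. The function name split_utrs, its parameters, and the toy sequence are hypothetical and serve only as an illustration.

```python
def split_utrs(mrna, start_index, stop_index):
    """Split an mRNA (written 5'->3', poly(A) tail removed) into 5' UTR, coding sequence, and 3' UTR.

    start_index: 0-based index of the A of the initiation AUG.
    stop_index:  0-based index of the first nucleotide of the termination codon.
    """
    if mrna[start_index:start_index + 3] != "AUG":
        raise ValueError("start_index does not point at an AUG")
    five_prime_utr = mrna[:start_index]        # up to, but not including, the AUG
    coding = mrna[start_index:stop_index + 3]  # AUG through the termination codon
    three_prime_utr = mrna[stop_index + 3:]    # begins 3' adjacent to the termination codon
    return five_prime_utr, coding, three_prime_utr


# Toy example (hypothetical sequence, not taken from any real transcript)
toy = "GCCACC" + "AUGGCUUAA" + "CUUGAU"
five, cds, three = split_utrs(toy, start_index=6, stop_index=12)
```

In this toy case the 5' UTR is the six nucleotides preceding the AUG and the 3' UTR is the six nucleotides following the UAA termination codon.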

[0017] By "random digestion" of poly(A)+ RNA is meant RNase digestion using, for example, RNase H and random 
primers to digest the RNA into smaller fragments at random sites. 

[0018] By "loading poly(A)+ RNA with ribosomes" is meant contacting the RNA population with ribosomes, for
example, in a rabbit reticulocyte lysate, to allow for loading of the ribosomes onto the RNA. To maximize ribosome loading,
a chemical that prevents ribosome runoff, for example, cycloheximide, can be included.
[0019] By a "plurality" is meant more than one. 

[0020] By "skewed expression ratios" is meant a change in the ratio of expression of a first reporter gene that is
associated with a specific UTR to expression of a non-UTR associated second reporter gene, as compared to the ratio
of expression of the first reporter gene that is not associated with the same UTR to expression of the non-UTR
associated second reporter gene.

[0021] The screening assay and the 5' and 3' mRNA untranslated region (UTR) biased cDNA libraries of the present 
invention have a number of advantages. The biased UTR libraries provide a collection of UTR sequences that are
isolated and separated from any adjacent coding sequences. Thus, screening these libraries provides an opportunity to
screen essentially complete UTR sequences without interference from coding sequences. In addition, the quantity of
sequences screened and the specificity of output can be modulated by controlling conditions that regulate the number 
of different plasmids that enter each cell. In most circumstances, the ideal number of plasmids per cell would be limited 
to one, thereby reducing signal dilution and the occurrence of false negative results. 

[0022] Other features and advantages of the invention will be apparent from the detailed description thereof and
from the claims. 

Description of the Figures 

[0023] Fig. 1 demonstrates RNase H digestion of a control RNA sequence using specific or partially degenerate
oligodeoxynucleotide 7-mers, under conditions that allow hydrolysis only if 6 or more consecutive base pairs are hy- 
bridized (compare lanes 4 and 5). 

[0024] Fig. 2 demonstrates RNase H digestion of a control sequence using two different sequence specific oligode- 
oxynucleotides, under conditions that allow hydrolysis only if 7 consecutive base pairs are hybridized (see lane 3). 
[0025] Fig. 3 shows RNase H digestion of poly(A)+ RNA using a partially degenerate oligonucleotide 7-mer, under
conditions that allow hydrolysis only if 7 consecutive base pairs are hybridized. The number of hydrolysis sites can be 
limited, even after extended incubation (compare lanes 6 and 7). 
[0026] Fig. 4 illustrates limited reverse transcription of 3' UTR sequences. 

Detailed Description

[0027] The practice of the present invention employs conventional techniques in biochemistry, molecular biology, 
microbiology, and related fields that are known to those skilled in the art. These techniques are fully explained in the 
literature (see, e.g., Maniatis et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press
(1982); Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory
Press (1989); and Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons (1987-1996 ed.)).

Construction of UTR Libraries 

[0028] Poly(A)+ RNA is isolated from total cellular RNA, according to standard protocol (Aviv and Leder, Proc. Natl.
Acad. Sci. USA 69: 1408-12, 1972).

[0029] To construct 5' UTR biased libraries, poly(A)+ RNA is subjected to controlled, non-random enzymatic digestion 
followed by size selection. The enzymatic digestion of the poly(A)+ RNA is carried out, for example, using E. coli RNase 
H in the presence of a 7-mer oligodeoxynucleotide mixture, wherein the sequences of the oligodeoxynucleotides have 
A, C, G, and T at frequencies of occurrence that are complementary to the frequencies of occurrence of A, C, G, and
U in all known vertebrate mRNA sequences between the -3 and +4 positions of the mRNA (where position 1 of the 
oligodeoxynucleotide is complementary to position +4 on the mRNA and position 7 is complementary to position -3 on 
the mRNA; see Table 1). 

Table 1. Designing Degenerate Oligonucleotides for the Isolation of 5' UTRs

[0030] Given that E. coli RNase H requires hybridization of four consecutive base pairs in order to recognize a DNA/RNA
duplex region as a substrate (Donis-Keller, Nucleic Acids Res. 7: 179-192, 1979), the controlled RNase H digestion
using the above-described oligodeoxynucleotides will primarily hydrolyze the initiation codon, but because of the
degeneracy of the oligodeoxynucleotide mixture, and the minimum consecutive number of base pairs required under
physiological conditions, RNase H can also hydrolyze the RNA at many other locations, including regions in the 5'
UTR. To further restrict the digestion to the initiation codon, conditions can be modified such that RNase H recognition
requires hybridization of more than five base pairs (see Example 1). The AUG sequence is rare within the 5' UTR
sequences (Kozak, Nucleic Acids Res. 15: 8125-48, 1987). Therefore, this RNase H digestion will preferentially result
in intact, full-length 5' UTR sequences that are separated from the adjoining coding sequences.
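
The effect of raising the required number of consecutive base pairs can be sketched in Python. The sketch below is a simplified model, not the procedure of this description: it considers only Watson-Crick pairing between a DNA oligodeoxynucleotide and an RNA window of the same length, ignores wobble pairing and reaction conditions, and uses hypothetical function names and toy sequences.

```python
# Allowed (DNA base, RNA base) Watson-Crick pairs
PAIRS = {("A", "U"), ("T", "A"), ("G", "C"), ("C", "G")}


def longest_paired_run(oligo, rna_window):
    """Longest run of consecutive base pairs between a DNA oligo (5'->3')
    and an equally long RNA window (5'->3'), aligned antiparallel."""
    best = run = 0
    for dna_base, rna_base in zip(oligo, reversed(rna_window)):
        run = run + 1 if (dna_base, rna_base) in PAIRS else 0
        best = max(best, run)
    return best


def candidate_sites(oligo, rna, min_run):
    """0-based start positions of RNA windows whose longest paired run is at least min_run."""
    k = len(oligo)
    return [i for i in range(len(rna) - k + 1)
            if longest_paired_run(oligo, rna[i:i + k]) >= min_run]


# Toy sequences (hypothetical): the 7-mer GGCATGG is fully complementary to CCAUGCC
perfect_sites = candidate_sites("GGCATGG", "AAACCAUGCCAAA", min_run=7)
mismatch_run = longest_paired_run("GGCATGG", "CCAUGAC")
```

Under this model, a threshold of four base pairs admits many partially matched windows, whereas a threshold of six or seven restricts candidate sites to windows that are almost fully complementary to the 7-mer.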
[0031] To enrich the population of 5' UTR-containing fragments within the mRNA sample, fragments of up to 1000
nucleotides are selected using denaturing agarose gels. The 5' UTRs of most vertebrate mRNAs fall within the size
range of 20-100 nucleotides (Kozak, supra). Subsequent to size selection, the mRNA sample is subjected to affinity
purification using a recombinant eIF4E fusion protein that interacts with the mRNA 5' cap structure (Sonenberg and
Gingras, Curr. Opin. Cell Biol. 10: 268-75, 1998).

[0032] An alternative strategy for isolating 5' UTRs from purified poly(A)+ RNA is to reverse transcribe the poly(A)+ 
RNA using a degenerate (i.e., mixed-sequence) primer that hybridizes preferentially to the region surrounding and
including the initiation codon (the 3' border of the 5' UTR). 

[0033] The consensus sequence surrounding the initiation codon of vertebrate mRNAs is GCC (G/A)CC AUG G
(SEQ ID NO: 1), where the underlined sequence (AUG) is the initiation codon, and the nucleotides in parentheses are found
with nearly equal frequency at that position. 
[0034] A degenerate primer complementary to this consensus sequence can be designed that takes into account all
the variations in frequency of the nucleotides at each position, so that the primer mixture has a high probability of 
hybridizing specifically to the initiation codon region. Table 2, below, shows that primers can be designed, based on 
the known sequences of hundreds of vertebrate mRNAs. 

Table 2. Designing Degenerate Primers for the Isolation of 5' UTRs

Data based on Kozak, Nucleic Acids Res. 15: 8125-8148, 1987 and Kozak, Gene 234: 187-208, 1999.



[0035] Referring to Table 2, the mixed-sequence primer reading 5' to 3' is complementary to the mRNA sequence
surrounding the initiation codon. The numbering across the top from +4 through -6 corresponds to the numbering for 
the mRNA sequence, where position +1 is the first nucleotide of the initiation codon, and all the negative numbers refer 
to nucleotides in the 5' UTR. The percentages refer to the frequency of occurrence of a given nucleotide at a given 
position. Therefore, the primer would be synthesized such that, for example, at position 5, A occurs 10% of the time,
C occurs 20% of the time, G occurs 55% of the time, and T occurs 15% of the time. Note that positions 2, 3, and 4 are
invariant as they are complementary to the initiation codon, AUG. It is expected that a degenerate primer of the above 
composition would hybridize preferentially to the region of the mRNA surrounding and including the initiation codon. 
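
The conversion from mRNA base frequencies to primer base frequencies can be sketched in Python as follows. The sketch assumes that primer position 5 pairs with mRNA position -1 (there being no position 0), so the -1 column below is simply the complement of the position 5 example given above; the function name primer_frequencies and the variable names are hypothetical, and no other frequencies from Table 2 are reproduced.

```python
# RNA base on the mRNA -> complementary DNA base in the primer
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def primer_frequencies(mrna_columns):
    """Convert per-position mRNA base frequencies (listed 5'->3' on the mRNA)
    into per-position primer base frequencies (listed 5'->3' on the primer)."""
    return [
        {COMPLEMENT[base]: percent for base, percent in column.items()}
        for column in reversed(mrna_columns)  # the primer runs antiparallel to the mRNA
    ]


# mRNA positions -1, +1, +2, +3 (percent occurrence of each base)
mrna_columns = [
    {"A": 15, "C": 55, "G": 20, "U": 10},  # -1, inferred from the position 5 example
    {"A": 100},                            # +1 (A of the initiation codon)
    {"U": 100},                            # +2
    {"G": 100},                            # +3
]
primer_columns = primer_frequencies(mrna_columns)  # primer positions 2, 3, 4, 5
```

The first three columns of the result are the invariant C, A, and T complementary to AUG, and the fourth column reproduces the position 5 mixture stated above.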
[0036] Following RT-PCR to generate a minus strand cDNA hybridized to mRNA, the heteroduplex can be isolated 
by affinity purification of the complex. The mRNA/cDNA hybrids are incubated with either a monoclonal antibody to
the 5' cap or a cap-binding protein, for example, an eIF4E protein attached to a solid matrix, washed and eluted to
enrich for RNAs containing the full 5' UTR. Following elution of the complex, the RNA is digested with RNase H and
terminal transferase is used to label the 3' end of the cDNA with poly d(T). Poly d(A) is then used to prime the
second strand synthesis of the cDNA. The 5' UTR enriched library is then cloned 5' to the reporter gene.
[0037] To construct the 3' UTR biased libraries, poly(A)+ RNA is digested, for example, using random primers and
E. coli RNase H, followed by selection of poly(A)-containing fragments using oligo(dT)-linked resin. The isolated
poly(A)-containing fragments are incubated with reverse transcriptase using oligo(dT) primers. Alternatively, to retrieve
mRNA that is exclusively 3' UTR, isolated RNA is allowed to associate with ribosomes, for example, in lysates from
rabbit reticulocytes. Under conditions in which ribosome run-off is inhibited by cycloheximide, reverse transcription is
performed in the presence of oligo(dT) and a low efficiency polymerase (see Example 2).

[0038] The purified 5' and 3' UTR RNA fragments are subjected to 5' RACE (Rapid Amplification of cDNA Ends) to 
obtain double-stranded cDNA (Frohman, In: PCR Protocols: A Guide to Methods and Applications, pages 28-38, Eds: 
Innis et al., Academic Press, London). The 5' or 3' UTR cDNAs are then ligated into an expression vector of choice, 
for example, a retroviral vector. The 5' and 3' UTR sequences are positioned upstream or downstream, respectively,
of a reporter gene's coding sequence. 

Screening Assay 

[0039] The expression vectors used for transfection of host cells each encode one UTR, in operative linkage to a
promoter, linked to its UTR-associated first reporter gene. The vector also includes a second, different reporter gene 
that is operably linked to a promoter, but is not associated with the UTR. Expression of this UTR-independent second 
reporter gene is not regulated by any UTR effect. Thus, expression of this second reporter gene controls for differences 
in expression that result from variations in plasmid number or transcriptional efficiency. In addition, conditions can be
varied to reduce the number of different vectors, and, thus, the number of UTRs, that are introduced into each cell. To
carry out host cell transfection, conditions are adopted to limit transfection, preferably, to less than 5 plasmids per cell, 
most preferably, to one plasmid per cell. Usually it is preferable to identify conditions that allow nearly clonal delivery 
of the vectors to the cells. For retroviral transduction methods, cells are infected at a multiplicity of infection (MOI) such 
that each cell is infected with approximately one virus. The MOI can be determined empirically for each cell line and
construct. Alternatively, plasmids can be delivered to cells via protoplast fusion (Tan and Frankel, Proc. Natl. Acad.
Sci. 95: 4247-52, 1998). For this method, E. coli are transformed with plasmid libraries, the bacterial cell walls are
removed, and the resulting protoplasts are fused to mammalian cells with polyethylene glycol. By adjusting the ratio of
protoplasts to mammalian cells, plasmid delivery is reported to be nearly clonal, with individual cells containing 1000
copies of a single plasmid.
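
As an aid to choosing a starting MOI before empirical optimization, the following Python sketch estimates how many infected cells carry exactly one virus. It assumes that the number of viruses entering each cell follows a Poisson distribution with mean equal to the MOI; that assumption is not part of this description, and the function names are hypothetical.

```python
import math


def poisson_fraction(moi, k):
    """Fraction of cells receiving exactly k viruses, assuming a Poisson model."""
    return math.exp(-moi) * moi ** k / math.factorial(k)


def fraction_single_among_infected(moi):
    """Among cells that received at least one virus, the fraction that received exactly one."""
    infected = 1.0 - poisson_fraction(moi, 0)
    return poisson_fraction(moi, 1) / infected


low_moi = fraction_single_among_infected(0.1)
high_moi = fraction_single_among_infected(1.0)
```

Under this model, lowering the MOI increases the proportion of infected cells that carry a single vector, at the cost of leaving more cells uninfected.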

[0040] The choice of cell type to be used will depend on several factors, for example, the biological system of interest
and the ease of foreign DNA transfection. Thus, if the biological system of interest is breast cancer-related genes, a 
breast cancer cell line may be used. In addition, given that retroviral transduction may be the only efficient means of 
transfection in some cell lines, use of these cells will not be preferred if another means of transfection is desired. 
[0041] Expression of the UTR-associated reporter gene will be compared to expression of the non-UTR associated
second reporter gene. Any discrepancies in this ratio of expression could reflect UTR-mediated changes in mRNA
translation, export, or stability. Many potential schemes for detecting expression, and identifying expression-altering 
UTRs are available. Particularly well-suited systems are those that produce a colored or otherwise detectable product 
as determined by gel electrophoresis, detection of fluorescence, chemiluminescence, or antibody binding. For example, 
cells that express such UTRs can be identified and isolated using a fluorescence activated cell sorter (FACS) and
green fluorescent protein (GFP) as a reporter gene (Bierhuizen et al., Biochem. Biophys. Res. Commun. 234: 371-375,
1997; Grignani et al., Cancer Res. 58: 14-19, 1998; de Martin et al., Gene Ther. 4: 493-495, 1997; Foster et al., J.
Virol. Methods 75: 151-60, 1998). Such a system is advantageous for high throughput screening. Other systems that
can be used to track gene expression include detecting E. coli lacZ-encoded β-galactosidase activity coupled with a
fluorogenic substrate (Fiering et al., Cytometry 12: 291-301, 1991) and detecting the expression of foreign cell-surface
antigens by means of fluorescently-labeled antibodies (Planelles et al., Gene Ther. 2: 369-76, 1995).

[0042] In the case of detection by fluorescence, the emission spectra of the fluorophores used to track expression 
of the UTR-associated first reporter genes and non-UTR associated second reporter genes must be sufficiently different 
so that, for example, the FACS instrument can perform two-color analysis and sort cells on the basis of the correlation 
between expression of the two reporter genes. The transfected cell population will consist of four different expression
patterns as follows: 1) cells that are negative for both gene markers, indicating transfection failure; 2) cells with a
ratiometric relationship between expression of the UTR-linked gene and the control gene, indicating that the UTR has 
no effect on gene expression; 3) cells with disproportionately higher expression of the UTR-linked gene, indicating that 
the UTR enhances translation efficiency or mRNA stability; and 4) cells with disproportionately lower UTR-linked gene 
expression, indicating that the UTR reduces translation efficiency or mRNA stability. 
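
The four expression patterns can be expressed as a simple classification rule, sketched here in Python. The background cutoff, the fold threshold, and the function and label names are illustrative assumptions and are not values taken from this description; in practice the gates would be set from control populations on the sorter.

```python
def classify_cell(utr_signal, control_signal, baseline_ratio,
                  background=1.0, fold_threshold=2.0):
    """Assign one cell to one of the four expression patterns.

    baseline_ratio: UTR-linked / control reporter ratio measured with a vector lacking the UTR.
    background and fold_threshold are illustrative cutoffs (assumptions).
    """
    if utr_signal <= background and control_signal <= background:
        return "untransfected"   # pattern 1: negative for both markers
    if control_signal <= background:
        return "unclassified"    # control reporter not detected, so the ratio is undefined
    fold = (utr_signal / control_signal) / baseline_ratio
    if fold >= fold_threshold:
        return "enhanced"        # pattern 3: disproportionately higher UTR-linked expression
    if fold <= 1.0 / fold_threshold:
        return "reduced"         # pattern 4: disproportionately lower UTR-linked expression
    return "no effect"           # pattern 2: proportional expression


labels = [classify_cell(u, c, baseline_ratio=1.0)
          for u, c in [(0.5, 0.5), (100, 100), (400, 100), (20, 100)]]
```

Normalizing by the control reporter is what allows differences in plasmid number or transcriptional efficiency to cancel out before the UTR effect is scored.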

[0043] Following FACS sorting, cells with skewed fluorescence signals can be collected for further analysis. The
sequence of the expression-altering UTR can be determined using, for example, PCR with vector primers, or plasmid 
rescue. One fluorescent color readout is dependent upon levels of expression of the non UTR-linked second reporter 
gene and the other color is dependent upon the levels of expression of the UTR-linked gene. The FACS instrument is 
capable of determining the levels of expression of both colors simultaneously and plots the two levels for each individual
cell versus each other. It is expected that most UTRs will not affect gene expression and therefore, a majority of the
transfected cells should express a consistently proportional level of both gene products. This population of cells will
occupy a characteristic region of the two color plot. Cells that fall outside of this region will be automatically sorted into
one of two tubes, with UTR-linked genes that proportionally up-regulate gene expression in one tube and UTR-linked
genes that down-regulate gene expression in the other.

[0044] A similar strategy can be used to screen and identify compounds that affect the function of the 5' and 3' UTR 
regulatory elements. Compounds that modulate the UTR effect on gene expression would skew the expression of the 
UTR-linked gene as compared to gene expression in the absence of the compound. 

Example 1: Selective RNase H Digestion of mRNA 

[0045] Conditions for digestion can be adopted that prevent RNase H hydrolysis unless mRNA hybridization to the 
oligodeoxynucleotide probe encompasses more than 5 or 6 consecutive nucleotides. This was demonstrated in an 

experiment in which a 7-mer oligodeoxynucleotide was designed to hybridize to a control mRNA species at multiple
locations, but to form no more than five consecutive DNA/RNA base pairs at any one of these locations. No hydrolysis 
occurred using this oligodeoxynucleotide, but it did occur using a partially degenerate oligodeoxynucleotide, NNCATNN 
(where N is an equimolar mixture of A, C, G, and T) which allowed hybridization of 6 or 7 consecutive base pairs (see 
Fig. 1). Following denaturation of 0.2 µg control RNA (Promega luciferase control sequence) and 70 pmol
oligodeoxynucleotide in 10 mM Tris HCl, pH 8.0, 50 mM NaCl, at 70° C for 10 minutes, samples were submerged in ice. RNase
H, MgCl2, and DTT were added to final concentrations of 0.4 units, 5 mM, and 1 mM, respectively. Samples were
incubated at 20° C for 60 minutes. The reactions were terminated by the addition of EDTA to a final concentration of 
25 mM, and digestion products were separated and visualized on a 1% TBE non-denaturing agarose gel stained with 
ethidium bromide. 

[0046] Conditions for RNase digestion can also be controlled such that a sequence-specific oligodeoxynucleotide
7-mer will mediate RNase H-catalyzed hydrolysis of RNA only at the single site where seven consecutive DNA/RNA
base pairs can form (see Fig. 2). These conditions included denaturing 0.2 µg control RNA (Promega luciferase control
sequence) and 250 nmol oligodeoxynucleotide in 10 mM Tris HCl, pH 8.0, 50 mM NaCl, at 70° C for 10 minutes before
submerging the samples in ice. Following the addition of RNase H, MgCl2, and DTT, as described above, and incubation
at 20° C for 60 minutes, the digestion was terminated with the addition of EDTA to a final concentration of 25 mM.
Digestion products were separated and visualized on a 6% polyacrylamide gel stained with ethidium bromide. 
[0047] A population of poly(A)+ RNA can be substituted for a control mRNA, and the poly(A)+ RNA can be partially 
hydrolyzed with a degenerate oligodeoxynucleotide, as shown in Fig. 3. Thus, under conditions that prevent hydrolysis
by RNase H when fewer than seven consecutive DNA/RNA base pairs form, a partially degenerate
oligodeoxynucleotide can be used in the reaction with poly(A)+ RNA, and the number of hydrolysis sites can still be limited, even
after an extended incubation period.

Example 2: Use of Ribosomes to Construct Full Length 3' UTRs 

[0048] Using reverse transcription and an oligo(dT) primer, a full length 3' UTR sequence can be copied to cDNA.
Reverse transcription begins with the poly(A) region and proceeds upstream towards the 5' end of the 3' UTR. To
terminate reverse transcription at the coding sequence termination site, the mRNA is fully loaded with actively translating
ribosomes, which cause steric hindrance of the transcriptase. Given that ribosomes do not bind mRNA downstream of
the termination codon, the reverse transcriptase proceeds unhindered to copy the entire 3' UTR sequence, but the
activity of the reverse transcriptase is then terminated, effectively separating the full length 3' UTR from any upstream
coding sequence (see Fig. 4).

Other Embodiments 

[0049] All publications mentioned herein are hereby incorporated by reference.



Claims 

1. A cDNA library consisting essentially of at least 100 different cDNA sequences that correspond to different mRNA
untranslated region (UTR) sequences isolated and separate from adjacent mRNA coding sequences. 

2. A cDNA library, wherein said library is constructed by steps comprising 

a) purifying poly(A)+ RNA from total RNA;

b) performing controlled, non-random enzymatic digestion of AUG sequences in the poly(A)+ RNA; 

c) purifying said digested RNA to obtain the fragments containing the 5' end sequences; and 

d) synthesizing cDNA from the purified RNA obtained in step (c); 
wherein said library consists essentially of cDNA sequences corresponding to mRNA 5' untranslated region
(UTR) sequences, isolated and separate from adjacent mRNA coding sequences. 

3. The cDNA library of claim 2, wherein said enzymatic digestion is carried out using RNase H. 

4. A cDNA library, wherein said library is constructed by steps comprising 

a) purifying poly(A)+ RNA from total RNA; 

b) synthesizing nucleic acid heteroduplexes from said poly(A)+ RNA using degenerate primers that hybridize
to the region surrounding and including the initiation codon, said heteroduplexes comprising the 5' end
sequences of said RNA;

c) purifying the heteroduplexes obtained in step (b) to obtain the fragments containing the 5' end sequences; 
and 

d) synthesizing cDNA from the purified heteroduplexes obtained in step (c);

wherein said library consists essentially of cDNA sequences corresponding to mRNA 5' untranslated region (UTR)
sequences, isolated and separate from adjacent mRNA coding sequences.

5. The cDNA library of claim 2 or 4, wherein said 5' sequence purification is carried out using a cap binding protein. 

6. A cDNA library, wherein said library is constructed by the steps comprising 

a) purifying poly(A)+ RNA from total RNA; 

b) performing random digestion on the poly(A)+ RNA; 

c) purifying said digested RNA to obtain poly(A) containing fragments; and

d) synthesizing cDNA from the purified RNA obtained in step (c);

wherein said library consists essentially of cDNA sequences corresponding to 3' UTR sequences, isolated 
and separate from adjacent mRNA coding sequences. 

7. A cDNA library, wherein said library is constructed by steps comprising 

a) purifying poly(A)+ RNA from total RNA; 

b) loading said poly(A)+ RNA with ribosomes; and 

c) performing reverse transcription on said loaded poly(A)+ RNA using an oligo(dT) primer and polymerase;

wherein said library consists essentially of cDNA sequences corresponding to 3' UTR sequences, isolated 
and separate from adjacent mRNA coding sequences. 

8. The cDNA library of claim 1, 2, 4 wherein said cDNA sequences are cloned into a vector system that can express
said sequences.

9. The cDNA library of claim 1, 2, 4, 6 or 7 wherein said UTR sequences are isolated in intact form.

10. A vector comprising

a) a nucleotide sequence encoding an mRNA UTR sequence in operative linkage to a promoter, wherein said
nucleotide sequence is derived from the cDNA library of claim 1, 2, 4, 6 or 7;

b) a first reporter gene positioned for transcription upstream or downstream of said UTR-encoding nucleotide
sequence; and

c) a second, different reporter gene in operative linkage to a promoter but unassociated with said UTR-encoding
nucleotide sequence.

11. The vector of claim 10 wherein said reporter genes encode a fluorescent protein or cell surface marker protein.

12. A method of identifying a regulatory UTR sequence, said method comprising 

a) transfecting a plurality of host cells with a plurality of vectors of claim 11, wherein said host cells are
transfected with different UTR sequences;

b) sorting cells on the basis of the ratio between expression of the first reporter gene and the second reporter 
gene; 

c) identifying the cells of step (a) that have skewed expression ratios as compared to the population of cells 
of step (a) as a whole, or as compared to cells transfected with a vector that encodes the first and second 
reporter gene, but lacks the corresponding UTR sequence; and 

d) sequencing the UTR expressed in said identified cells. 

13. The method of claim 12 wherein said gene expression is detected by emission of fluorescence. 

14. The method of claim 13 wherein said cells are sorted by a fluorescence activated cell sorter. 

15. A cell transfected with the vector of claim 10. 
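
The ratio-based identification recited in claim 12, steps (b) and (c), can be illustrated with a minimal sketch. The claims do not specify how a "skewed" ratio is scored, so the two-fold cutoff, the use of the population median as the reference, the function name skewed_cells, and the intensity values below are all assumptions chosen for illustration only.

```python
import math
from statistics import median


def skewed_cells(cells, fold_cutoff=2.0, reference_log_ratio=None):
    """Return the ids of cells whose reporter ratio is skewed.

    cells maps a cell id to (first_reporter, second_reporter) intensities.
    A cell is called skewed when its first/second ratio differs from the
    reference by at least fold_cutoff in either direction. The reference is
    the population median unless reference_log_ratio (for example, from a
    control vector lacking the UTR) is supplied. All thresholds are assumed.
    """
    log_ratios = {
        cell_id: math.log2(first / second)
        for cell_id, (first, second) in cells.items()
        if first > 0 and second > 0  # skip cells with no measurable signal
    }
    if reference_log_ratio is None:
        reference_log_ratio = median(log_ratios.values())
    limit = math.log2(fold_cutoff)
    return sorted(
        cell_id
        for cell_id, log_ratio in log_ratios.items()
        if abs(log_ratio - reference_log_ratio) >= limit
    )


# Hypothetical fluorescence intensities (arbitrary units).
population = {
    "c1": (100, 100),
    "c2": (110, 100),
    "c3": (90, 100),
    "c4": (400, 100),  # first reporter elevated relative to second
    "c5": (20, 100),   # first reporter reduced relative to second
}
print(skewed_cells(population))  # ['c4', 'c5']
```

Working on log ratios treats up- and down-regulation symmetrically, which is why a single cutoff suffices in this sketch.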
FIG. 2
FIG. 3
FIG. 4

The Search Division considers that the present European patent application does not comply with the 
requirements of unity of invention and relates to several inventions or groups of inventions, namely: 

1. Claims: 1,8-15 (partially) and 2,3,4,5 (complete) 

A cDNA library consisting essentially of mRNA 5' 
untranslated regions and which is constructed by (i) 
purifying poly(A)+ RNA from total RNA; performing controlled 
non-random digestion of AUG sequences in the poly(A)+ RNA; 
purifying said digested RNA to obtain fragments containing 
the 5' end sequences and synthesizing cDNA from the purified 
RNA or (ii) purifying poly(A)+ RNA from total RNA, 
synthesizing nucleic acid heteroduplexes from said poly(A)+ 
RNA using degenerate primers, purifying said heteroduplexes; 
and synthesizing cDNA from the purified heteroduplexes; said 
cDNA library cloned in a vector system; said cDNA library 
wherein said UTR sequences are isolated in intact form; a 
bicistronic vector comprising an mRNA UTR in operative 
linkage to a promoter, wherein said nucleotide sequence is
derived from the cDNA library, a first gene associated to 
said UTR and a second gene unassociated to said UTR; a 
method to identify a regulatory UTR sequence by using the 
bicistronic vector. 

2. Claims: 1,8-15 (partially) and 6 (complete) 

As invention 1 but relating to a cDNA library which 
essentially consists of mRNA 3' untranslated regions, said
library being obtained by purifying poly(A)+ RNA from total 
RNA, performing random digestion on the poly(A)+ RNA; 
purifying said digested RNA to obtain poly(A) containing 
fragments and synthesizing cDNA from the purified RNA 
previously obtained. 

3. Claims: 1,8-15 (partially) and 7 (complete)

As invention 2 but relating to a cDNA library obtained by 
purifying poly(A)+ RNA from total RNA, loading said poly(A)+ 
RNA on ribosomes; performing reverse transcription on said 
poly(A)+ RNA using an oligo(dT) primer. 




EP 1 176 196 A1 



ANNEX TO THE EUROPEAN SEARCH REPORT 

ON EUROPEAN PATENT APPLICATION NO. EP 00 11 5854 



This annex lists the patent family members relating to the patent documents cited in the above-mentioned European search report.
The members are as contained in the European Patent Office EDP file on 12-07-2001.
The European Patent Office is in no way liable for these particulars which are merely given for the purpose of information.



Patent document             Publication     Patent family        Publication
cited in search report      date            member(s)            date

WO 98555G2      A           10-12-1998      EP  0988313 A        29-03-2000
                                            US  6187544 B        13-02-2001

US 6083727      A           04-07-2000      AU  5753400 A        31-01-2001
                                            WO  0100820 A        04-01-2001

WO 9423041      A           13-10-1994      AU  6497694 A        24-10-1994
                                            CA  2159639 A        13-10-1994
                                            EP  0693126 A        24-01-1996
                                            JP  8509213 T        01-10-1996
                                            US  5738985 A        14-04-1998
                                            US  6156496 A        05-12-2000





For more details about this annex: see Official Journal of the European Patent Office, No. 12/82





